Social Image Parsing by Cross-Modal Data Refinement

Authors

  • Zhiwu Lu
  • Xin Gao
  • Songfang Huang
  • Liwei Wang
  • Ji-Rong Wen
Abstract

This paper presents a cross-modal data refinement algorithm for social image parsing, i.e., segmenting all the objects within a social image and then identifying their categories. Unlike traditional fully supervised image parsing, which takes pixel-level labels as strong supervisory information, our social image parsing is initially provided only with the noisy tags of images (i.e., image-level labels) shared by social users. By oversegmenting each image into multiple regions, we formulate social image parsing as a cross-modal data refinement problem over a large set of regions, where the initial labels of each region are inferred from image-level labels. Furthermore, we develop an efficient algorithm to solve this cross-modal data refinement problem. Experimental results on several benchmark datasets show the effectiveness of our algorithm. More notably, our algorithm can be considered an alternative and natural way to address the challenging problem of image parsing, since image-level labels are much easier to obtain than pixel-level labels.
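The pipeline the abstract describes, assigning each oversegmented region the noisy tag set of its parent image and then refining those region-level labels, can be illustrated with a minimal sketch. This is not the authors' algorithm; the refinement step below is a deliberately simple neighbor-voting heuristic, and all function and variable names are hypothetical:

```python
from collections import Counter


def initial_region_labels(image_tags, image_regions):
    """Each region initially inherits the full (noisy) tag set of its image."""
    labels = {}
    for img_id, regions in image_regions.items():
        for region in regions:
            labels[region] = set(image_tags[img_id])
    return labels


def refine(labels, neighbors, n_iters=5):
    """Toy refinement: keep only candidate labels that at least one
    neighboring region also carries; fall back to the original candidates
    when no neighbor supports any of them."""
    for _ in range(n_iters):
        new_labels = {}
        for region, candidates in labels.items():
            votes = Counter()
            for nb in neighbors.get(region, []):
                votes.update(labels[nb])
            kept = {c for c in candidates if votes[c] > 0}
            new_labels[region] = kept or candidates
        labels = new_labels
    return labels
```

For example, a region tagged {sky, grass} whose only (hypothetical) neighbor carries {sky, car} would be refined to {sky}, the single label the two region-level candidate sets agree on.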


Similar References

Large Scale Metric Learning for Matching of Heterogeneous Multimedia Data

Heterogeneous multimedia data are widely encountered in many applications, such as photo-sketch face recognition, still image to video face recognition, cross-modality image synthesis, cross media retrieval, etc. With the ubiquitous use of digital imaging devices, mobile terminals and social networks, there are lots of heterogeneous and homogeneous data from multiple sources, e.g., news media w...


Layered Hypernetwork Models for Cross-Modal Associative Text and Image Keyword Generation in Multimodal Information Retrieval

Conventional methods for multimodal data retrieval use text-tag-based or cross-modal approaches such as tag-image co-occurrence and canonical correlation analysis. However, since text and image features differ in granularity, approaches based on lower-order relationships between modalities may have limitations. Here, we propose a novel text and image keyword generation method b...


Cross-modal Common Representation Learning by Hybrid Transfer Network

DNN-based cross-modal retrieval is a research hotspot for retrieving across different modalities such as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In the single-modal scenario, a similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (such as ImageNet). Knowledge from such single-modal datasets is also very us...


Learning from Multiple Views of Data

Title of dissertation: LEARNING FROM MULTIPLE VIEWS OF DATA Abhishek Sharma, Doctor of Philosophy, 2015 Proposal directed by: Professor David W. Jacobs, Department of Computer Science. This dissertation takes inspiration from our brain's ability to extract information and learn from multiple sources of data, and tries to mimic this ability for some practical problems. It explores the hypothes...


Collective Deep Quantization for Efficient Cross-Modal Retrieval

Cross-modal similarity retrieval is the problem of designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve texts. This paper presents a compact coding solution for efficient cross-modal retrieval, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal simil...



Journal title:

Volume   Issue 

Pages  -

Publication date: 2015